class: center, middle, inverse, title-slide # Geospatial Techniques for Social Scientists in R ## Vector Data ### Stefan Jünger & Anne-Kathrin Stroppe
February 08, 2021 --- layout: true --- ## Now <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Title </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> February 08 </td> <td style="text-align:left;color: gray !important;"> 09:00am-10:30am </td> <td style="text-align:left;font-weight: bold;"> Introduction </td> </tr> <tr> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> February 08 </td> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 11:00am-12:30pm </td> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Vector Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> February 08 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:30pm-01:30pm </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> February 08 </td> <td style="text-align:left;color: gray !important;"> 01:30pm-03:00pm </td> <td style="text-align:left;font-weight: bold;"> Basic Maps </td> </tr> <tr> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> February 08 </td> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 03:30pm-05:00pm </td> <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> February 09 </td> <td style="text-align:left;color: gray !important;"> 09:00am-10:30am </td> <td style="text-align:left;font-weight: bold;"> Advanced Data Import </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> February 09 </td> <td style="text-align:left;color: gray !important;"> 11:00am-12:30pm </td> <td style="text-align:left;font-weight: bold;"> Applied Data Wrangling </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> February 09 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:30pm-13:30pm </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> February 09 </td> <td style="text-align:left;color: gray !important;"> 01:30pm-03:00pm </td> <td style="text-align:left;font-weight: bold;"> Advanced Maps I </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> February 09 </td> <td style="text-align:left;color: gray !important;"> 03:30pm-05:00pm </td> <td style="text-align:left;font-weight: bold;"> Advanced Maps II </td> </tr> </tbody> </table> --- ## Why Care About Data Types and Formats? There is a difference how spatial information is stored, processed and visually represented. What does this mean for us? - Different commands to import and handle the data - Spatial linking techniques and analyses partially determined by data format - Visualization of data can differ So: Always know what kind of data you are dealing with! --- ## Representing the World in Vectors .pull-left[ .center[ <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/world-cities-1.png" width="120%" style="display: block; margin: auto;" /> ] ] .pull-right[ In a nutshell, the surface of earth is represented by simple geometries and attributes assigned to each of these features. What usually comes as vector data: - administrative borders - rivers and roads, - buildings, cities and more The basis for vector data are georeferenced coordinates. Every object is defined by longitude (x) and latitude (y). ] --- ## Vector Data: Geometries .pull-left[ Each real world features is represented as one of three types of geometries: - Points: discrete location (f.e., well) - Lines: linear feature (f.e., river) - Polygons: enclosed area (f.e., lake) However, the geometries are not fixed. A city might be represented as a point on the map of the world, but for an enclosed look take the shape of a polygon. ] .pull-right[ <img src="data:image/png;base64,#./img/vector_geometries.png" width="90%" style="display: block; margin: auto;" /> <br> <small><small><small> National Ecological Observatory Network (NEON), cited by [Datacarpentry](https://datacarpentry.org/organization-geospatial/02-intro-vector-data/)</small></small></small> ] --- ## Vector Data: Attribute Tables The geometries of vector data hold firstly and foremost only up to three information: - location (points, lines and polygons) - length (lines and polygons) - area (polygons) We need to assign attributes to each geometry to hold additional information. Thereby, attribute tables correspond to data tables. Each row represents a geometric object, which we can also call observation or case. Each column holds an attribute or in "our" language a variable. --- ## Vector Data: Attribute Tables .center[ <img src="data:image/png;base64,#./img/attr_table.png" width="90%" style="display: block; margin: auto;" /> ] --- ## New Best Friend: Shapefiles Both, the geometric information and attribute table, can be saved within one file. Most commonly, we use ESRI Shapefiles to store vector data. They hold the information on geometric type, there location, and coordinate reference system but also store the attributes. Even though we import only one shapefile, they consists of at least three mandatory files with the extensions: - .shp : shape format - .shx : shape index format - .dbf : attribute format - (.prj: CRS/Projection) You don't have to remember what they are standing for, but know that you won't be able to load the data if one of those files is missing. --- ## First Summary 1. Vector Data come in three different geometric types: points, lines, polygons. 2. The geometric information are joined by an attribute table holding information for each geometric object. 3. Attribute tables can be treated as data tables. 4. Everything is stored in a Shapefile. Let's give this some context and get to know vector data in R! --- ## Datas Sets In the folder called `data` in the same folder as the other materials for this workshop, you can find the data files we prepped for all the exercises and slides. Please make sure that if you reuse any of the provided data to cite the original data sources. You will find information on the data sources in the respective folders. The first data sets in use are provided by the German Federal Agency for Cartography and Geodesy [http://www.bkg.bund.de). They offer free and high quality data and have an [Open Data Portal](https://gdz.bkg.bund.de/index.php/default/open-data.html). --- ## Welcome to `simple features` .pull-left[ .small[ There are several packages out there to wrangle and visualize spatial and, especially, vector data within `R`. We are fans of a package called ``sf` ("simple features"). Why? `simple features` refers to a formal standard to represent spatial geometries and supports interfaces to other programming languages and GIS systems. As a bonus, it allows us to work in a `tibble` environment. You might come across other packages, like `sp`, but we advice to focus on `simple features` whenever possible. ] ] .pull-right[ <img src="data:image/png;base64,#./img/sf.jpg" width="1600" style="display: block; margin: auto;" /> <small><small>Illustration by [Allison Horst](https://github.com/allisonhorst/stats-illustrations) </small></small> ] --- ## Load a shapefile The first step is, of course, loading the data. Here, we want to load the shapefile with the layer of the administrative borders of the German states (*Bundesländer*) called `GER_STATES`. ```r # load library library(sf) # define at least the data source name (dsn) # and the layer name german_states <- st_read(dsn = "./data", layer = "GER_STATES") ``` ``` ## Reading layer `GER_STATES' from data source `C:\Users\annes\Documents\gesis-workshop-geor\data' using driver `ESRI Shapefile' ## Simple feature collection with 35 features and 25 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 5.86625 ymin: 47.27012 xmax: 15.04182 ymax: 55.09917 ## geographic CRS: ETRS89 ``` --- ## Inspect your data: Classics Let's have a quick look at the imported data. Like with every other data set, we inspect the data to check some metadata and see if the importing worked correctly. ```r # object type class(german_states) ``` ``` ## [1] "sf" "data.frame" ``` ```r # number of rows nrow(german_states) ``` ``` ## [1] 35 ``` ```r # number of columns ncol(german_states) ``` ``` ## [1] 26 ``` --- ## Inspect your data: Classics ```r # head of data table head(german_states) ``` ``` ## Simple feature collection with 6 features and 25 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 5.86625 ymin: 49.39485 xmax: 11.59808 ymax: 55.05838 ## geographic CRS: ETRS89 ## ADE GF BSG RS AGS SDV_RS GEN BEZ IBZ ## 1 2 4 1 01 01 010020000000 Schleswig-Holstein Land 20 ## 2 2 4 1 02 02 020000000000 Hamburg Freie und Hansestadt 22 ## 3 2 4 1 03 03 032410001001 Niedersachsen Land 20 ## 4 2 4 1 04 04 040110000000 Bremen Freie Hansestadt 23 ## 5 2 4 1 05 05 051110000000 Nordrhein-Westfalen Land 20 ## 6 2 4 1 06 06 064140000000 Hessen Land 20 ## BEM NBD SN_L SN_R SN_K SN_V1 SN_V2 SN_G FK_S3 NUTS RS_0 AGS_0 ## 1 -- ja 01 0 00 00 00 000 0 DEF 010000000000 01000000 ## 2 -- ja 02 0 00 00 00 000 0 DE6 020000000000 02000000 ## 3 -- ja 03 0 00 00 00 000 0 DE9 030000000000 03000000 ## 4 -- ja 04 0 00 00 00 000 0 DE5 040000000000 04000000 ## 5 -- ja 05 0 00 00 00 000 0 DEA 050000000000 05000000 ## 6 -- ja 06 0 00 00 00 000 0 DE7 060000000000 06000000 ## WSK EWZ KFL DEBKG_ID geometry ## 1 2012-02-01 2889821 15804.35 DEBKGDL200000029 MULTIPOLYGON (((8.345917 54... ## 2 1974-01-01 1830584 755.09 DEBKGDL20000E6GD MULTIPOLYGON (((9.731356 53... ## 3 2015-01-01 7962775 47709.80 DEBKGDL20000E6EW MULTIPOLYGON (((9.773735 53... ## 4 2010-01-01 681032 419.41 DEBKGDL20000E0SF MULTIPOLYGON (((8.616398 53... ## 5 2009-11-01 17912134 34112.32 DEBKGDL20000E6GR MULTIPOLYGON (((6.205109 50... ## 6 2015-01-01 6243262 21115.71 DEBKGDL20000E3LK MULTIPOLYGON (((8.680587 49... ``` --- ## Inspect your data: Spatial features Besides our general data inspection, we also want to check the spatial features of our import. This includes the geometric type (points? lines? polygons?) and the coordinate reference system. You can see that there is no big differences between the simple feature data frame, we just imported and a regular data table. ```r # type of geometry st_geometry(german_states) ``` ``` ## Geometry set for 35 features ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 5.86625 ymin: 47.27012 xmax: 15.04182 ymax: 55.09917 ## geographic CRS: ETRS89 ## First 5 geometries: ``` --- ## Inspect your data: Spatial features Since we have a polygon, the geometry table stores a list with two columns of latitude and longitude information. The points build the basis for ech polygon. ```r # the simple features column attr(german_states, "sf_column") ``` ``` ## [1] "geometry" ``` ```r # further inspecting glimpse(german_states$geometry) ``` ``` ## sfc_MULTIPOLYGON of length 35; first list element: List of 29 ## $ :List of 1 ## ..$ : num [1:20, 1:2] 8.35 8.35 8.35 8.35 8.35 ... ## $ :List of 1 ## ..$ : num [1:10, 1:2] 9.7 9.7 9.7 9.69 9.69 ... ## $ :List of 1 ## ..$ : num [1:33, 1:2] 9.56 9.56 9.57 9.57 9.57 ... ## $ :List of 1 ## ..$ : num [1:28, 1:2] 11.2 11.2 11.2 11.2 11.2 ... ## $ :List of 1 ## ..$ : num [1:34, 1:2] 8.36 8.36 8.37 8.37 8.37 ... ## $ :List of 1 ## ..$ : num [1:14, 1:2] 9.72 9.72 9.72 9.72 9.73 ... ## $ :List of 1 ## ..$ : num [1:23, 1:2] 8.51 8.51 8.51 8.51 8.51 ... ## $ :List of 1 ## ..$ : num [1:30, 1:2] 11.1 11.1 11.1 11.1 11.1 ... ## $ :List of 1 ## ..$ : num [1:34, 1:2] 8.77 8.77 8.77 8.77 8.77 ... ## $ :List of 1 ## ..$ : num [1:47, 1:2] 9.55 9.55 9.55 9.55 9.55 ... ## $ :List of 1 ## ..$ : num [1:46, 1:2] 8.73 8.73 8.73 8.73 8.73 ... ## $ :List of 1 ## ..$ : num [1:38, 1:2] 8.55 8.55 8.55 8.55 8.55 ... ## $ :List of 1 ## ..$ : num [1:54, 1:2] 7.92 7.92 7.92 7.92 7.92 ... ## $ :List of 1 ## ..$ : num [1:35, 1:2] 8.82 8.83 8.83 8.83 8.83 ... ## $ :List of 1 ## ..$ : num [1:61, 1:2] 7.88 7.89 7.89 7.89 7.89 ... ## $ :List of 1 ## ..$ : num [1:66, 1:2] 8.7 8.7 8.7 8.71 8.71 ... ## $ :List of 1 ## ..$ : num [1:48, 1:2] 9.42 9.42 9.42 9.41 9.41 ... ## $ :List of 1 ## ..$ : num [1:72, 1:2] 8.82 8.82 8.82 8.82 8.82 ... ## $ :List of 1 ## ..$ : num [1:68, 1:2] 8.74 8.74 8.74 8.74 8.74 ... ## $ :List of 1 ## ..$ : num [1:87, 1:2] 8.69 8.69 8.69 8.69 8.69 ... ## $ :List of 1 ## ..$ : num [1:199, 1:2] 9.56 9.56 9.56 9.55 9.55 ... ## $ :List of 1 ## ..$ : num [1:198, 1:2] 9.53 9.53 9.52 9.52 9.52 ... ## $ :List of 1 ## ..$ : num [1:133, 1:2] 8.52 8.52 8.52 8.52 8.52 ... ## $ :List of 1 ## ..$ : num [1:265, 1:2] 8.66 8.65 8.65 8.65 8.65 ... ## $ :List of 1 ## ..$ : num [1:339, 1:2] 8.34 8.34 8.34 8.34 8.34 ... ## $ :List of 1 ## ..$ : num [1:190, 1:2] 8.71 8.7 8.71 8.71 8.71 ... ## $ :List of 1 ## ..$ : num [1:335, 1:2] 8.56 8.57 8.57 8.57 8.57 ... ## $ :List of 1 ## ..$ : num [1:809, 1:2] 11.1 11.1 11.1 11.1 11.1 ... ## $ :List of 1 ## ..$ : num [1:12662, 1:2] 8.45 8.45 8.45 8.46 8.46 ... ## - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg" ``` --- ## Inspect your data: Spatial features Remember: Stefan already talked about the importance of the CRS. And that we personally prefer the EPSG::3035 as a standard CRS. ```r # coordinate reference system st_crs(german_states) ``` ``` ## Coordinate Reference System: ## User input: ETRS89 ## wkt: ## GEOGCRS["ETRS89", ## DATUM["European Terrestrial Reference System 1989", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## USAGE[ ## SCOPE["unknown"], ## AREA["Europe - ETRS89"], ## BBOX[32.88,-16.1,84.17,40.18]], ## ID["EPSG",4258]] ``` --- ## `st_transform` When a CRS is messed up or you want to combine data with not matching CRS, it will all go downwards from there. The good thing is that the command `st_transform` allows us to *translate* our spatial data from one coordinate reference system to another. ```r # transform crs german_states <- st_transform(german_states, crs = 3035) # check crs st_crs(german_states) ``` ``` ## Coordinate Reference System: ## User input: EPSG:3035 ## wkt: ## PROJCRS["ETRS89-extended / LAEA Europe", ## BASEGEOGCRS["ETRS89", ## DATUM["European Terrestrial Reference System 1989", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4258]], ## CONVERSION["Europe Equal Area 2001", ## METHOD["Lambert Azimuthal Equal Area", ## ID["EPSG",9820]], ## PARAMETER["Latitude of natural origin",52, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8801]], ## PARAMETER["Longitude of natural origin",10, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8802]], ## PARAMETER["False easting",4321000, ## LENGTHUNIT["metre",1], ## ID["EPSG",8806]], ## PARAMETER["False northing",3210000, ## LENGTHUNIT["metre",1], ## ID["EPSG",8807]]], ## CS[Cartesian,2], ## AXIS["northing (Y)",north, ## ORDER[1], ## LENGTHUNIT["metre",1]], ## AXIS["easting (X)",east, ## ORDER[2], ## LENGTHUNIT["metre",1]], ## USAGE[ ## SCOPE["unknown"], ## AREA["Europe - LCC & LAEA"], ## BBOX[24.6,-35.58,84.17,44.83]], ## ID["EPSG",3035]] ``` --- ## A very, VERY first map We had a look on all various kinds of information of our data set without seeing a single map. Making nice maps is an art on its own, see the upcoming session. But for inspecting the data and check if we actually loaded what we want to load, we can have a very first glimpse. ```r # plot sf object plot(german_states) ``` .center[ <img src="data:image/png;base64,#./img/plot_german_states.png" width="60%" style="display: block; margin: auto;" /> ] --- ## Import Point Layer Unfortunately, the data we want to visualize or analyze are not always available as shapefiles. Especially, point coordinates are often stored in table formats like `.csv`. The location of German hospitals can be donloaded via the hopital register of the Federal Statistical Office of Germany. They were geocoded and referecend by Anne and finally, stored in a `.csv`-file. ```r hospitals_df <- read.csv("./data/hospital_points.csv", header = T, fill = T, sep = ",") ``` --- ## From Data Table to Simple Features We see that besides our attributes (e.g., year of data collection, number of beds...), the table contains the two variables "X" and "Y" , our point coordinates. When using the command `st_as_sf` make sure to use the option `crs`. If not used, your CRS will not be defined and you won't be able to perform further commands depending on the CRS. ```r # inspect data head(hospitals_df) ``` ``` ## name year beds ags X Y ## 1 Ev.-luth. Diakonissenkrankenhaus 2017 537 1001000 4284058 3520680 ## 2 Malteser Krankenhaus St. Franziskus 2017 317 1001000 4283679 3520866 ## 3 Ameos Klinikum Kiel 2017 25 1002000 4333097 3465583 ## 4 Helios Klinik Kiel 2017 50 1002000 4328588 3466456 ## 5 Klinik Flechsig GmbH 2017 30 1002000 4328612 3468441 ## 6 Städtisches Krankenhaus 2017 678 1002000 4328612 3468441 ``` ```r class(hospitals_df) ``` ``` ## [1] "data.frame" ``` --- ## From Data Table to Simple Features Good thing that Anne was the one who geocoded the addresses, so the data are most likely stored in EPSG:3035. .pull-left[ ```r # transform to spatial data frame hospitals_sf <- st_as_sf(hospitals_df, coords = c("X", "Y"), crs = 3035) # inspect data class(hospitals_sf) plot(hospitals_sf) ``` ] .pull-right[ ``` ## [1] "sf" "data.frame" ``` <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/plot-hospitals-1.png" style="display: block; margin: auto;" /> ] --- ## ... and the other way round You already have enough of spatial data, simple featurs, and co? Do you want to go back handling a simple data frame? Here is your chance! You can easily achieve this by removing the geometry column or set the geometry to *Null*. ```r # check class class(german_states) ``` ``` ## [1] "sf" "data.frame" ``` ```r # remove geometry german_states_df <- st_set_geometry(german_states, NULL) # check class class(german_states_df) ``` ``` ## [1] "data.frame" ``` --- ## Data Wrangling After importing the data sets, we are now ready to manipulate our data. In general, data manipulation of the attribute tables of simple feature objects can be done with all usual data manipulation commands in *R/RStudio*. We are working here with the `dplyr` package to manipulate the data frames for all regular data wrangling tasks. But if you are used to work with the base R language, feel free to do so. .center[ <img src="data:image/png;base64,#./img/tidyverse.png" width="50%" style="display: block; margin: auto;" /> <small><small>Meme found on [Reddit](https://www.reddit.com/r/Rlanguage/comments/anv1d5/my_meme_of_the_day/?utm_source=share&utm_medium=web2x&context=3) </small></small> ] --- ## Data Intro: German Districts New task means, new data! We're moving "a layer down" and look at Germany on a more fine-grained spatial level: the district. The layer is called *german_districts* and contains only one additional attribute (id). ```r german_districts <- st_read(dsn = "./data", layer = "GER_DISTRICTS") %>% st_transform(. , crs = 3035) ``` ``` ## Reading layer `GER_DISTRICTS' from data source `C:\Users\annes\Documents\gesis-workshop-geor\data' using driver `ESRI Shapefile' ## Simple feature collection with 401 features and 1 field ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 5.86625 ymin: 47.27012 xmax: 15.04182 ymax: 55.05838 ## geographic CRS: ETRS89 ``` --- ## German Districts .small[ <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/plot_districts-1.png" width="70%" style="display: block; margin: auto;" /> ] --- ## Data Intro: Attributes Since it would be a little bit boring to work with just one attribute, we prepared an extra table with more attributes called *attributes_districts*. .small[ You'll find here 1. Data on Covid-19 cases and deaths as of Feb 2nd 2021 (Prepared by the Robert-Koch-Institut and downloaded from the [NPGEO data hub](https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0)) 2. Election Results for the German Right-Wing Populist Party *AfD* in the 2017 German federal election ([Der Bundeswahlleiter, Wiesbaden 2018](https://www.bundeswahlleiter.de/bundestagswahlen/2017/ergebnisse/weitere-ergebnisse.html)) ] ```r attributes_districts <- read.csv("./data/attributes_districts.csv", header = T, fill = T, sep = ",") ``` ``` ## district_id population cases deaths cases_per_100k cases_7days death_7days ## 1 1001 90164 1044 21 1157.8901 108 1 ## 2 1002 246794 3037 71 1230.5810 130 2 ## 3 1003 216530 3652 44 1686.6023 313 3 ## 4 1004 80196 1001 17 1248.1919 55 0 ## 5 1051 133193 1246 37 935.4846 36 0 ## 6 1053 198019 2520 57 1272.6052 137 0 ## afd_voteshare_2017 ## 1 7.449591 ## 2 6.877478 ## 3 8.792160 ## 4 10.226906 ## 5 8.494436 ## 6 10.142414 ``` --- ## Add Attributes: Join Data Table You might already spotted that we have an id for the districts in both data table. With a regular left join of our additional attributes to the `sf` object, we can enhance our spatial data object and add more and more attributes. ```r german_districts_enhanced <- german_districts %>% rename(., district_id = id) %>% left_join(. , attributes_districts, by = "district_id") class(german_districts_enhanced) ``` ``` ## [1] "sf" "data.frame" ``` ```r head(german_districts_enhanced,2) ``` ``` ## Simple feature collection with 2 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 4279627 ymin: 3460480 xmax: 4335232 ymax: 3524426 ## projected CRS: ETRS89-extended / LAEA Europe ## district_id population cases deaths cases_per_100k cases_7days death_7days ## 1 1001 90164 1044 21 1157.890 108 1 ## 2 1002 246794 3037 71 1230.581 130 2 ## afd_voteshare_2017 geometry ## 1 7.449591 MULTIPOLYGON (((4283235 352... ## 2 6.877478 MULTIPOLYGON (((4331981 348... ``` --- ## Exercise 1 --- ## Add (More) Attributes: Spatial Join, Spatial Intersects and More Besides the regular join, we can also perform a so called *spatial join*. This is a small outlook for tomorrow on Data Wrangling and linking, The task here is rather simple: We want to count the number of hospitals in each German district. The hospital point layer was imported and is luckily already in the same CRS as polygon layer of the German districts. `st_join` does the trick. The object *hospitals_in_districts* contains the corresponding district_id for each hospital. We can add the hospital count by rearranging the data and left join again. ```r # perfom spatial join hospitals_in_districts <- st_join(hospitals_sf, german_districts_enhanced, join = st_within) # count the number of hospitals within a district hospital_districts_count <- count(as_tibble(hospitals_in_districts), district_id) %>% rename(hospital_ct = n) # optional # Join the hospital count with the german district german_districts_enhanced<- left_join(german_districts_enhanced, hospital_districts_count, by = "district_id") %>% mutate(hospital_ct = tidyr::replace_na(hospital_ct, 0)) # optional ``` --- ## Did that work? ```r tmap::tm_shape(german_districts_enhanced)+ tmap::tm_polygons("hospital_ct", n=10 ) ``` <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/unnamed-chunk-3-1.png" width="50%" style="display: block; margin: auto;" /> --- ## Subsetting the Data You can use your usual data wrangling workflow also when subsetting the data. .pull-left[ ```r # subsetting berlin <- german_districts_enhanced %>% filter(. , district_id == 11000) %>% select(., district_id) plot(berlin) # or in Base R german_districts_enhanced[which (german_districts_enhanced$district_id==11000), "district_id"] %>% plot() ``` ] .pull-right[ <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/stouches-1.png" style="display: block; margin: auto;" /> ] --- ## Using `sf` for subsetting You can use `st_touches` to identify all districts surrounding Berlin. Tomorrows session on applied data wrangling will feature some more spatial data wrangle examples! .pull-left[ ```r # subsetting with st_touches berlin_surrounding <- german_districts_enhanced %>% select(. , district_id) %>% filter(., lengths(st_touches(., berlin)) > 0) plot(berlin_surrounding) ``` ] .pull-right[ <img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/surround-1.png" style="display: block; margin: auto;" /> ] --- ## Export the Data After Wrangling and adjusting the data, you can save them. There are, again, several options to do so. Two notes: .small[ 1.Be careful when saving shapefiles: column names will automatically abbreviated! 2.Make sure that the CRS is included either in your folder or the file name. ] ```r # Export as Shapefile st_write(german_districts_enhanced, dsn = "./data/own_exports/districts_enhanced_epsg3035.shp", layer = "german_districts_enhanced_epsg3035.shp", # optional driver = "ESRI Shapefile", # optional delete_layer = TRUE) #optional # Export data frame as csv without geometric attributes german_districts_enhanced_df <- st_set_geometry(german_districts_enhanced, NULL) write.csv(german_districts_enhanced , "./data/own_exports/german_districts_enhanced.csv", row.names = F, na= "." ) # Export data frame as csv with geometric attributes st_write(hospitals_sf, "./data/own_exports/hospitals_epsg3035.csv", layer_options = "GEOMETRY=AS_XY") ``` --- ## Excercise 2 --- ## Wrap-Up and Lunch Break We made it through our first session dealing with vector data. You can load, transform, manipulate and export the data. The next step is producing a nice map and you already are well-equipped to present your geodata in a nice way. But, that is for after lunch! --- layout: false class: center background-image: url(data:image/png;base64,#./img/the_end.png) background-size: cover .left-column[ </br> <img src="data:image/png;base64,#./img/anne.png" width="90%" style="display: block; margin: auto;" /> ] .right-column[ .left[.small[`<i class="fas fa-envelope "></i>`{=html} [`anne-kathrin.stroppe@gesis.org`](mailto:anne-kathrin.stroppe@gesis.org)] </br> .small[`<i class="fab fa-twitter "></i>`{=html} [`@AStroppe`](https://twitter.com/AStroppe)] </br> .small[`<i class="fab fa-github "></i>`{=html} [`stroppan`](https://github.com/stroppan)] </br> ]